Abstract

Nature-inspired algorithms are among the most powerful algorithms for optimization. This paper intends to provide a detailed description of a new Firefly
Algorithm (FA) for multimodal optimization applications. We will compare the
proposed firefly algorithm with other metaheuristic algorithms such as particle
swarm optimization (PSO). Simulations and results indicate that the proposed
firefly algorithm is superior to existing metaheuristic algorithms. Finally, we
will discuss its applications and implications for further research.

Citation detail: X.-S. Yang, "Firefly algorithms for multimodal optimization", in: Stochastic Algorithms: Foundations and Applications, SAGA 2009,
Lecture Notes in Computer Science, Vol. 5792, pp. 169-178 (2009).

1 Introduction 

Biologically inspired algorithms are becoming powerful in modern numerical
optimization [1, 2, 4, 6, 9, 10], especially for NP-hard problems such as
the travelling salesman problem. Among these biology-derived algorithms,
multi-agent metaheuristic algorithms such as particle swarm optimization have
become active research topics in state-of-the-art algorithm development for
optimization and other applications [1, 2, 9].

Particle swarm optimization (PSO) was developed by Kennedy and Eberhart in
1995 [5], based on swarm behaviour in nature, such as fish schooling and bird
flocking, the so-called swarm intelligence. Though particle swarm optimization
has many similarities with genetic algorithms, it is much simpler because
it does not use mutation/crossover operators. Instead, it uses real-number
randomness and global communication among the swarming particles. In
this sense, it is also easier to implement, as it uses mainly real numbers.

This paper aims to introduce the new Firefly Algorithm and to provide
a comparative study of FA with PSO and other relevant algorithms. We
will first outline particle swarm optimization, then formulate the firefly
algorithm, and finally compare the performance of these algorithms. FA
optimization seems more promising than particle swarm optimization in the
sense that FA can deal with multimodal functions more naturally and
efficiently. In addition, particle swarm optimization is just a special class
of the firefly algorithms, as we will demonstrate in this paper.

2 Particle Swarm Optimization 
2.1 Standard PSO 

The PSO algorithm searches the space of an objective function by adjusting
the trajectories of individual agents, called particles, as the piecewise paths
formed by positional vectors in a quasi-stochastic manner [5, 6]. There are
now about 20 different variants of PSO. Here we only describe the
simplest yet popular standard PSO.

The particle movement has two major components: a stochastic component
and a deterministic component. A particle is attracted toward the position of
the current global best $g^*$ and its own best location $x_i^*$ in history, while at
the same time it has a tendency to move randomly. When a particle finds a
location that is better than any previously found location, it stores that
location as the new current best for particle $i$. There is a current global best
for all $n$ particles. The aim is to find the global best among all the current best
solutions until the objective no longer improves or after a certain number of iterations.

For the particle movement, we use $x_i^*$ to denote the current best for particle
$i$, and $g^* \approx \min$ or $\max\{f(x_i)\}$ $(i = 1, 2, \ldots, n)$ to denote the current global best.
Let $x_i$ and $v_i$ be the position vector and velocity for particle $i$, respectively.
The new velocity vector is determined by the following formula

$$v_i^{t+1} = v_i^t + \alpha \epsilon_1 \odot (g^* - x_i^t) + \beta \epsilon_2 \odot (x_i^* - x_i^t), \tag{1}$$

where $\epsilon_1$ and $\epsilon_2$ are two random vectors, with each entry taking values
between 0 and 1. The Hadamard product of two matrices $u \odot v$ is defined as
the entrywise product, that is $[u \odot v]_{ij} = u_{ij} v_{ij}$. The parameters $\alpha$ and $\beta$ are
the learning parameters or acceleration constants, which can typically be taken
as, say, $\alpha \approx \beta \approx 2$. The initial values of $x_i^{t=0}$ can be taken as the bounds or
limits $a = \min(x_j)$, $b = \max(x_j)$, and $v_i^{t=0} = 0$. The new position can then be
updated by

$$x_i^{t+1} = x_i^t + v_i^{t+1}. \tag{2}$$

Although $v_i$ can take any value, it is usually bounded in some range $[0, v_{\max}]$.

There are many variants which extend the standard PSO algorithm, and the
most noticeable improvement is probably to use an inertia function $\theta(t)$ so that $v_i^t$
is replaced by $\theta(t) v_i^t$, where $\theta$ takes values between 0 and 1. In the simplest
case, the inertia function can be taken as a constant, typically $\theta \approx 0.5$ to $0.9$.
This is equivalent to introducing a virtual mass to stabilize the motion of the
particles, and thus the algorithm is expected to converge more quickly.
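The update rules in Eqs. (1) and (2), together with the optional constant inertia $\theta$, can be sketched in Python as follows. This is a minimal sketch for minimization under stated assumptions, not the Matlab code used in the paper: initial positions are drawn uniformly in a box [lower, upper], each velocity component is clamped to [-v_max, v_max] when v_max is given (one reading of the velocity bound), and the global best is updated as soon as any particle improves on it. The function name `pso` and its parameters are illustrative.

```python
import random


def pso(f, lower, upper, n=40, iters=200, alpha=2.0, beta=2.0,
        theta=1.0, v_max=None, seed=0):
    """Minimize f over the box [lower, upper] with a basic PSO sketch.

    theta = 1.0 gives Eq. (1) without inertia; a constant theta in
    [0.5, 0.9] gives the inertia variant.
    """
    rng = random.Random(seed)
    d = len(lower)
    x = [[rng.uniform(lower[k], upper[k]) for k in range(d)] for _ in range(n)]
    v = [[0.0] * d for _ in range(n)]            # v_i^{t=0} = 0
    x_best = [xi[:] for xi in x]                 # x_i^*: best position of particle i
    f_best = [f(xi) for xi in x]
    g = min(range(n), key=lambda i: f_best[i])
    g_star, f_g = x_best[g][:], f_best[g]        # g^*: current global best
    for _ in range(iters):
        for i in range(n):
            for k in range(d):
                e1, e2 = rng.random(), rng.random()  # entries of epsilon_1, epsilon_2
                v[i][k] = (theta * v[i][k]
                           + alpha * e1 * (g_star[k] - x[i][k])
                           + beta * e2 * (x_best[i][k] - x[i][k]))
                if v_max is not None:            # bound the speed in each dimension
                    v[i][k] = max(-v_max, min(v_max, v[i][k]))
                x[i][k] += v[i][k]               # position update, Eq. (2)
            fi = f(x[i])
            if fi < f_best[i]:                   # update the particle's own best
                x_best[i], f_best[i] = x[i][:], fi
                if fi < f_g:                     # and, if better, the global best
                    g_star, f_g = x[i][:], fi
    return g_star, f_g
```

For example, `pso(lambda z: sum(t * t for t in z), [-5.0, -5.0], [5.0, 5.0], theta=0.5, v_max=1.0)` searches a 2-D sphere function with the inertia variant; without clamping or inertia, $\alpha \approx \beta \approx 2$ can let velocities grow large.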
3 Firefly Algorithm



3.1 Behaviour of Fireflies 

The flashing light of fireflies is an amazing sight in the summer sky in the 
tropical and temperate regions. There are about two thousand firefly species, 
and most fireflies produce short and rhythmic flashes. The pattern of flashes is 
often unique for a particular species. The flashing light is produced by a process
of bioluminescence, and the true functions of such signaling systems are still
being debated. However, two fundamental functions of such flashes are to attract
mating partners (communication) and to attract potential prey. In addition,
flashing may also serve as a protective warning mechanism. The rhythmic flash,
the rate of flashing and the amount of time form part of the signal system that
brings both sexes together. Females respond to a male's unique pattern of
flashing in the same species, while in some species such as Photuris, female
fireflies can mimic the mating flashing pattern of other species so as to lure and
eat the male fireflies, which may mistake the flashes for those of a suitable mate.

We know that the light intensity at a particular distance $r$ from the light
source obeys the inverse square law. That is to say, the light intensity $I$
decreases as the distance $r$ increases, in terms of $I \propto 1/r^2$. Furthermore, the
air absorbs light, so the light becomes weaker and weaker as the distance increases.
These two combined factors make most fireflies visible only up to a limited
distance, usually several hundred meters at night, which is usually good enough
for fireflies to communicate.

The flashing light can be formulated in such a way that it is associated with 
the objective function to be optimized, which makes it possible to formulate new 
optimization algorithms. In the rest of this paper, we will first outline the basic 
formulation of the Firefly Algorithm (FA) and then discuss the implementation 
as well as its analysis in detail. 

3.2 Firefly Algorithm 

Now we can idealize some of the flashing characteristics of fireflies so as to
develop firefly-inspired algorithms. For simplicity in describing our new Firefly
Algorithm (FA), we now use the following three idealized rules: 1) all fireflies
are unisex, so that one firefly will be attracted to other fireflies regardless of
their sex; 2) attractiveness is proportional to brightness; thus, for any
two flashing fireflies, the less bright one will move towards the brighter one.
The attractiveness is proportional to the brightness and they both decrease as
their distance increases. If there is no firefly brighter than a particular firefly,
it will move randomly; 3) the brightness of a firefly is affected or determined
by the landscape of the objective function. For a maximization problem, the
brightness can simply be proportional to the value of the objective function.
Other forms of brightness can be defined in a similar way to the fitness function
in genetic algorithms.

Based on these three rules, the basic steps of the firefly algorithm (FA) can 
be summarized as the pseudo code shown in Fig. 1. 
Firefly Algorithm

    Objective function f(x), x = (x_1, ..., x_d)^T
    Generate initial population of fireflies x_i (i = 1, 2, ..., n)
    Light intensity I_i at x_i is determined by f(x_i)
    Define light absorption coefficient γ
    while (t < MaxGeneration)
        for i = 1 : n  (all n fireflies)
            for j = 1 : n  (all n fireflies)
                if (I_j > I_i), move firefly i towards j in d dimensions; end if
                Attractiveness varies with distance r via exp[-γ r^2]
                Evaluate new solutions and update light intensity
            end for j
        end for i
        Rank the fireflies and find the current best
    end while
    Postprocess results and visualization

Figure 1: Pseudo code of the firefly algorithm (FA).
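The loop in Fig. 1 can be turned into a short runnable Python sketch for minimization. It assumes brightness $I = -f(x)$, the Gaussian attractiveness $\beta_0 e^{-\gamma r^2}$ and the uniform random step of the movement rule in Section 3.4, and it adds one assumption not in Fig. 1: positions are clipped to the search box. For simplicity, the currently brightest firefly stays in place instead of moving randomly. The function name `firefly` and its defaults are illustrative, not the author's implementation.

```python
import math
import random


def firefly(f, lower, upper, n=40, max_generation=50,
            alpha=0.2, beta0=1.0, gamma=1.0, seed=0):
    """Minimize f on the box [lower, upper] following the loop of Fig. 1.

    Brightness is I = -f(x), so lower objective values shine brighter.
    """
    rng = random.Random(seed)
    d = len(lower)
    # Generate the initial population x_i, i = 1..n, uniformly in the box.
    x = [[rng.uniform(lower[k], upper[k]) for k in range(d)] for _ in range(n)]
    light = [-f(xi) for xi in x]  # light intensity I_i determined by f(x_i)
    for _ in range(max_generation):
        for i in range(n):
            for j in range(n):
                if light[j] > light[i]:  # move firefly i towards brighter j
                    r2 = sum((x[i][k] - x[j][k]) ** 2 for k in range(d))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    for k in range(d):
                        step = beta * (x[j][k] - x[i][k]) + alpha * (rng.random() - 0.5)
                        # Clip to the search box (an assumption, not in Fig. 1).
                        x[i][k] = min(upper[k], max(lower[k], x[i][k] + step))
                    light[i] = -f(x[i])  # evaluate the new solution
    # Rank the fireflies and return the best one found.
    best = max(range(n), key=lambda i: light[i])
    return x[best][:], -light[best]
```

Because a firefly only moves when some other firefly is strictly brighter, the best objective value in this sketch never gets worse from one generation to the next.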



In a certain sense, there is some conceptual similarity between the firefly
algorithms and the bacterial foraging algorithm (BFA) [3, 7]. In BFA, the
attraction among bacteria is based partly on their fitness and partly on their
distance, while in FA, the attractiveness is linked to the objective function
and decays monotonically with distance. However, the agents in FA have
adjustable visibility and are more versatile in attractiveness variations, which
usually leads to higher mobility, and thus the search space is explored
more efficiently.

3.3 Attractiveness 

In the firefly algorithm, there are two important issues: the variation of light 
intensity and formulation of the attractiveness. For simplicity, we can always 
assume that the attractiveness of a firefly is determined by its brightness which 
in turn is associated with the encoded objective function. 

In the simplest case for maximization problems, the brightness $I$
of a firefly at a particular location $x$ can be chosen as $I(x) \propto f(x)$. However,
the attractiveness $\beta$ is relative: it should be seen in the eyes of the beholder or
judged by the other fireflies. Thus, it will vary with the distance $r_{ij}$ between
firefly $i$ and firefly $j$. In addition, light intensity decreases with the distance
from its source, and light is also absorbed in the media, so we should allow
the attractiveness to vary with the degree of absorption. In the simplest form,
the light intensity $I(r)$ varies according to the inverse square law $I(r) = I_s/r^2$,
where $I_s$ is the intensity at the source. For a given medium with a fixed light
absorption coefficient $\gamma$, the light intensity $I$ varies with the distance $r$. That
is, $I = I_0 e^{-\gamma r}$, where $I_0$ is the original light intensity. In order to avoid the
singularity at $r = 0$ in the expression $I_s/r^2$, the combined effect of both the
inverse square law and absorption can be approximated using the following
Gaussian form

$$I(r) = I_0 e^{-\gamma r^2}. \tag{3}$$

Sometimes, we may need a function which decreases monotonically at a slower 
rate. In this case, we can use the following approximation 

$$I(r) = \frac{I_0}{1 + \gamma r^2}. \tag{4}$$

At short distances, the above two forms are essentially the same. This is
because the series expansions about $r = 0$

$$e^{-\gamma r^2} \approx 1 - \gamma r^2 + \frac{1}{2}\gamma^2 r^4 + \cdots, \qquad \frac{1}{1 + \gamma r^2} \approx 1 - \gamma r^2 + \gamma^2 r^4 + \cdots \tag{5}$$

are equivalent to each other up to the order of $O(r^3)$.
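A quick numerical check of Eq. (5), sketched in Python: since the two expansions first differ at the $r^4$ term, the difference $1/(1+\gamma r^2) - e^{-\gamma r^2}$ should behave like $\gamma^2 r^4/2$ for small $r$. The helper names are illustrative; a ratio approaching 1 as $r$ shrinks confirms the leading term.

```python
import math


def gaussian_form(r, gamma):
    """Eq. (3) with I_0 = 1."""
    return math.exp(-gamma * r * r)


def rational_form(r, gamma):
    """Eq. (4) with I_0 = 1."""
    return 1.0 / (1.0 + gamma * r * r)


def leading_term_ratio(r, gamma):
    """(rational - gaussian) divided by gamma**2 * r**4 / 2, the first differing term."""
    diff = rational_form(r, gamma) - gaussian_form(r, gamma)
    return diff / (0.5 * gamma ** 2 * r ** 4)


for r in (0.4, 0.2, 0.1, 0.05):
    print(r, leading_term_ratio(r, 1.0))
```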

As a firefly's attractiveness is proportional to the light intensity seen by
adjacent fireflies, we can now define the attractiveness $\beta$ of a firefly by

$$\beta(r) = \beta_0 e^{-\gamma r^2}, \tag{6}$$

where $\beta_0$ is the attractiveness at $r = 0$. As it is often faster to calculate
$1/(1 + r^2)$ than an exponential function, the above function, if necessary, can
conveniently be replaced by $\beta = \beta_0/(1 + \gamma r^2)$. Equation (6) defines a characteristic
distance $\Gamma = 1/\sqrt{\gamma}$ over which the attractiveness changes significantly from $\beta_0$
to $\beta_0 e^{-1}$.

In the implementation, the actual form of the attractiveness function $\beta(r)$ can
be any monotonically decreasing function, such as the following generalized
form

$$\beta(r) = \beta_0 e^{-\gamma r^m}, \quad (m \geq 1). \tag{7}$$

For a fixed $\gamma$, the characteristic length becomes $\Gamma = \gamma^{-1/m} \to 1$ as $m \to \infty$.
Conversely, for a given length scale $\Gamma$ in an optimization problem, the parameter
$\gamma$ can be set to a typical initial value, that is, $\gamma = 1/\Gamma^m$.
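The relations between $\gamma$, $m$ and the characteristic length $\Gamma$ in Eqs. (6) and (7) can be checked with a few lines of Python. The helper names are illustrative; `gamma_for_length` simply applies $\gamma = 1/\Gamma^m$, and at $r = \Gamma$ the attractiveness should equal $\beta_0 e^{-1}$.

```python
import math


def attractiveness(r, beta0=1.0, gamma=1.0, m=2):
    """Generalized attractiveness beta(r) = beta0 * exp(-gamma * r**m), Eq. (7)."""
    return beta0 * math.exp(-gamma * r ** m)


def characteristic_length(gamma, m=2):
    """Distance Gamma = gamma**(-1/m) at which beta drops from beta0 to beta0 / e."""
    return gamma ** (-1.0 / m)


def gamma_for_length(length_scale, m=2):
    """Typical initial gamma for a problem length scale Gamma: gamma = 1 / Gamma**m."""
    return 1.0 / length_scale ** m


gamma = gamma_for_length(10.0)  # a problem whose typical scale is about 10
print(gamma, attractiveness(characteristic_length(gamma), gamma=gamma))
```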

3.4 Distance and Movement 

The distance between any two fireflies $i$ and $j$ at $x_i$ and $x_j$, respectively, is the
Cartesian distance

$$r_{ij} = \|x_i - x_j\| = \sqrt{\sum_{k=1}^{d} (x_{i,k} - x_{j,k})^2}, \tag{8}$$

where $x_{i,k}$ is the $k$th component of the spatial coordinate $x_i$ of the $i$th firefly. In
the 2-D case, we have $r_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$.

The movement of a firefly $i$ attracted to another, more attractive (brighter)
firefly $j$ is determined by

$$x_i = x_i + \beta_0 e^{-\gamma r_{ij}^2} (x_j - x_i) + \alpha \left(\text{rand} - \frac{1}{2}\right), \tag{9}$$
where the second term is due to the attraction, while the third term is
randomization, with $\alpha$ being the randomization parameter and rand a random number
generator uniformly distributed in $[0, 1]$. For most cases in our implementation,
we can take $\beta_0 = 1$ and $\alpha \in [0, 1]$. Furthermore, the randomization term can
easily be extended to a normal distribution $N(0, 1)$ or other distributions. In
addition, if the scales vary significantly in different dimensions, such as $-10^5$ to
$10^5$ in one dimension while, say, $-0.001$ to $0.01$ along the other, it is a good
idea to replace $\alpha$ by $\alpha S_k$, where the scaling parameters $S_k$ $(k = 1, \ldots, d)$ in the $d$
dimensions should be determined by the actual scales of the problem of interest.
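A single move under Eq. (9), using the distance of Eq. (8) and the per-dimension scaling $\alpha S_k$ suggested above, might look like the following Python sketch. The function names and the `scales` argument are illustrative assumptions; with `scales=None` every $S_k = 1$ and the step reduces to Eq. (9) as written.

```python
import math
import random


def distance(xi, xj):
    """Cartesian distance r_ij between two fireflies, Eq. (8)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def move_towards(xi, xj, beta0=1.0, gamma=1.0, alpha=0.2, scales=None, rng=random):
    """Return the new position of firefly i after moving towards brighter firefly j.

    `scales` holds the per-dimension factors S_k, so the random step in
    dimension k is alpha * S_k * (rand - 1/2); None means S_k = 1 everywhere.
    """
    if scales is None:
        scales = [1.0] * len(xi)
    beta = beta0 * math.exp(-gamma * distance(xi, xj) ** 2)  # attraction strength
    return [a + beta * (b - a) + alpha * s * (rng.random() - 0.5)
            for a, b, s in zip(xi, xj, scales)]
```

For a problem spanning $-10^5$ to $10^5$ in one dimension and $-0.001$ to $0.01$ in the other, one might pass `scales=[1e5, 1e-2]` so the random step is comparable to each range.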

The parameter $\gamma$ now characterizes the variation of the attractiveness, and
its value is crucially important in determining the speed of convergence and
how the FA algorithm behaves. In theory, $\gamma \in [0, \infty)$, but in practice, $\gamma = O(1)$
is determined by the characteristic length $\Gamma$ of the system to be optimized.
Thus, in most applications, it typically varies from 0.01 to 100.

3.5 Scaling and Asymptotic Cases 

It is worth pointing out that the distance $r$ defined above is not limited to
the Euclidean distance. We can define many other forms of distance $r$ in the
$d$-dimensional hyperspace, depending on the type of problem of interest.
For example, for job scheduling problems, $r$ can be defined as the time lag
or time interval. For complicated networks such as the Internet and social
networks, the distance $r$ can be defined as the combination of the degree of
local clustering and the average proximity of vertices. In fact, any measure
that can effectively characterize the quantities of interest in the optimization
problem can be used as the 'distance' $r$. The typical scale $\Gamma$ should be associated
with the scale of the optimization problem of interest. If $\Gamma$ is the typical scale
for a given optimization problem, then for a very large number of fireflies $n \gg m$,
where $m$ is the number of local optima, the initial locations of these $n$
fireflies should be distributed relatively uniformly over the entire search space, in
a similar manner to the initialization of quasi-Monte Carlo simulations. As
the iterations proceed, the fireflies would converge into all the local optima
(including the global ones) in a stochastic manner. By comparing the best
solutions among all these optima, the global optima can easily be found.
At the moment, we are trying to formally prove that the firefly algorithm will
approach the global optima when $n \to \infty$ and $t \gg 1$. In reality, it converges
very quickly, typically in fewer than 50 to 100 generations, and this will be
demonstrated using various standard test functions later in this paper.
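The text only asks for a quasi-Monte Carlo-like uniform spread of the initial fireflies. One common way to obtain that, assumed here rather than taken from the paper, is a Halton low-discrepancy sequence, sketched below in pure Python. The names `radical_inverse` and `halton_population` and the default list of primes are illustrative choices.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, f = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * f
        f /= base
    return result


def halton_population(n, lower, upper, primes=(2, 3, 5, 7, 11, 13)):
    """Place n fireflies with a Halton sequence, one prime base per dimension."""
    d = len(lower)
    if d > len(primes):
        raise ValueError("add more primes for higher dimensions")
    return [[lower[k] + (upper[k] - lower[k]) * radical_inverse(i + 1, primes[k])
             for k in range(d)]
            for i in range(n)]
```

For instance, `halton_population(40, [-5.0, -5.0], [5.0, 5.0])` gives 40 starting points that cover the square more evenly than 40 independent uniform draws typically do.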

There are two important limiting cases: $\gamma \to 0$ and $\gamma \to \infty$. For $\gamma \to 0$,
the attractiveness is constant, $\beta = \beta_0$, and $\Gamma \to \infty$; this is equivalent to saying
that the light intensity does not decrease in an idealized sky. Thus, a flashing
firefly can be seen anywhere in the domain, and a single (usually global)
optimum can easily be reached. This corresponds to a special case of particle
swarm optimization (PSO) discussed earlier. Consequently, the efficiency of
this special case is the same as that of PSO.

Figure 2: Michalewicz's function for two independent variables with a global minimum
$f_* \approx -1.801$ at $(2.20319, 1.57049)$.

On the other hand, the limiting case $\gamma \to \infty$ leads to $\Gamma \to 0$ and $\beta(r) \to
\delta(r)$ (the Dirac delta function), which means that the attractiveness is almost
zero in the sight of other fireflies, or the fireflies are short-sighted. This is
equivalent to the case where the fireflies fly randomly in a very foggy region.
No other fireflies can be seen, and each firefly roams in a completely random
way. Therefore, this corresponds to the completely random search method. As
the firefly algorithm usually lies somewhere between these two extremes, it is
possible to adjust the parameters $\gamma$ and $\alpha$ so that it can outperform both
random search and PSO. In fact, FA can find the global optima as well as all the
local optima simultaneously in a very effective manner. This advantage will be
demonstrated in detail later in the implementation. A further advantage of FA
is that different fireflies work almost independently, so it is particularly
suitable for parallel implementation. It is even better than genetic algorithms
and PSO in this respect, because fireflies aggregate more closely around each optimum
(without jumping around as in the case of genetic algorithms). The interactions between
different subregions are minimal in parallel implementation.
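The two limiting cases can be seen numerically by evaluating $\beta(r) = \beta_0 e^{-\gamma r^2}$ from Eq. (6) for $\gamma = 0$ and for a very large $\gamma$. In this sketch (names illustrative), $\gamma = 0$ gives the same attractiveness $\beta_0$ at every distance, the PSO-like case, while a huge $\gamma$ leaves attraction only at $r = 0$, the random-search case.

```python
import math


def attractiveness_profile(gamma, distances, beta0=1.0):
    """beta(r) = beta0 * exp(-gamma * r**2), Eq. (6), at each distance r."""
    return [beta0 * math.exp(-gamma * r * r) for r in distances]


distances = [0.0, 0.5, 1.0, 2.0, 5.0]
for gamma in (0.0, 1.0, 1e6):
    print(gamma, [round(b, 4) for b in attractiveness_profile(gamma, distances)])
```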

4 Multimodal Optimization with Multiple Optima

4.1 Validation 

In order to demonstrate how the firefly algorithm works, we have implemented it 
in Matlab. We will use various test functions to validate the new algorithm. As 
an example, we now use the FA to find the global optimum of the Michalewicz 
function 

$$f(x) = -\sum_{i=1}^{d} \sin(x_i) \left[\sin\left(\frac{i x_i^2}{\pi}\right)\right]^{2m}, \tag{10}$$

where $m = 10$ and $d = 1, 2, \ldots$. The global minimum $f_* \approx -1.801$ in 2-D occurs
at $(2.20319, 1.57049)$, which can be found after about 400 evaluations for 40
fireflies after 10 iterations (see Fig. 2 and Fig. 3). This is much more efficient
than most existing metaheuristic algorithms. In the above simulations, the
values of the parameters are $\alpha = 0.2$, $\gamma = 1$ and $\beta_0 = 1$.
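For readers who want to reproduce this test, Eq. (10) translates directly into Python. This is a sketch of the test function only (the name `michalewicz` is ours); evaluating it at the reported minimizer should give a value close to $-1.801$.

```python
import math


def michalewicz(x, m=10):
    """Michalewicz function, Eq. (10); x is a sequence of d coordinates."""
    return -sum(math.sin(xi) * math.sin(i * xi * xi / math.pi) ** (2 * m)
                for i, xi in enumerate(x, start=1))


print(michalewicz([2.20319, 1.57049]))
```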

Figure 3: The initial 40 fireflies (left) and their locations after 10 iterations (right).

We have also used much tougher test functions. For example, Yang described
a multimodal function which looks like a standing-wave pattern [11]

$$f(x) = \left[ e^{-\sum_{i=1}^{d} (x_i/a)^{2m}} - 2 e^{-\sum_{i=1}^{d} x_i^2} \right] \cdot \prod_{i=1}^{d} \cos^2 x_i, \quad m = 5, \tag{11}$$

which is multimodal with many local peaks and valleys, and it has a unique global
minimum $f_* = -1$ at $(0, 0, \ldots, 0)$ in the region $-20 \leq x_i \leq 20$, where $i = 1, 2, \ldots, d$
and $a = 15$. The 2D landscape of Yang's function is shown in Fig. 4.
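Eq. (11) can likewise be written as a short Python function (the name `yang_standing_wave` is ours). At the origin the bracket equals $1 - 2 = -1$ and the cosine product equals 1, which gives the stated minimum $f_* = -1$.

```python
import math


def yang_standing_wave(x, a=15.0, m=5):
    """Yang's standing-wave test function, Eq. (11)."""
    s1 = sum((xi / a) ** (2 * m) for xi in x)  # sum of (x_i / a)^(2m)
    s2 = sum(xi * xi for xi in x)              # sum of x_i^2
    prod = 1.0
    for xi in x:
        prod *= math.cos(xi) ** 2              # product of cos^2(x_i)
    return (math.exp(-s1) - 2.0 * math.exp(-s2)) * prod


print(yang_standing_wave([0.0, 0.0]), yang_standing_wave([1.0, -2.0]))
```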



4.2 Comparison of FA with PSO and GA 

Various studies show that PSO algorithms can outperform genetic algorithms
(GA) [4] and other conventional algorithms for solving many optimization
problems. This is partially due to the fact that the broadcasting ability of the
current best estimates gives better and quicker convergence towards the optimality.
A general framework for evaluating the statistical performance of evolutionary
algorithms has been discussed in detail by Shilane et al. [8]. Now we will compare
the Firefly Algorithm with PSO and genetic algorithms for various standard
test functions. For genetic algorithms, we have used the standard version with
no elitism, with a mutation probability of $p_m = 0.05$ and a crossover probability
of 0.95. For particle swarm optimization, we have also used the standard
version with the learning parameters $\alpha \approx \beta \approx 2$ without the inertia correction
[4, 5, 6]. We have used various population sizes from $n = 15$ to 200, and found
that for most problems it is sufficient to use $n = 15$ to 50. Therefore, we have
used a fixed population size of $n = 40$ in all our simulations for comparison.

After implementing these algorithms in Matlab, we have carried out
extensive simulations, and each algorithm has been run at least 100 times so as to
carry out meaningful statistical analysis. The algorithms stop when the
variations of function values are less than a given tolerance $\epsilon < 10^{-5}$. The results
are summarized in the following table (see Table 1), where the global optima
are reached. The numbers are in the format: average number of evaluations
± standard deviation (success rate), so 3752 ± 725 (99%) means that the average
number (mean) of function evaluations is 3752 with a standard deviation of 725.
The success rate of finding the global optima for this algorithm is 99%.
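The table entries can be produced from raw run data with a small helper such as the one below. It is a hypothetical sketch: the paper does not say whether the standard deviation is the sample or population version, or whether unsuccessful runs are included in the mean, so this version assumes the sample standard deviation over all runs.

```python
import statistics


def summarize(evaluations, successes):
    """Format runs as 'mean ± std (success rate)', in the style of Table 1.

    evaluations: number of function evaluations used in each run.
    successes: one boolean per run, True if that run reached the global optimum.
    """
    mean = statistics.mean(evaluations)
    std = statistics.stdev(evaluations)  # sample standard deviation (assumed)
    rate = 100.0 * sum(successes) / len(successes)
    return f"{mean:.0f} ± {std:.0f} ({rate:.0f}%)"


print(summarize([100, 200, 300], [True, True, False]))
```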

We can see that the FA is much more efficient in finding the global optima,
with higher success rates. Each function evaluation is virtually instantaneous
on a modern personal computer. For example, the computing time for 10,000
evaluations on a 3 GHz desktop is about 5 seconds. Even with graphics for
displaying the locations of the particles and fireflies, it usually takes less than
a few minutes. It is worth pointing out that more formal statistical hypothesis
testing can be used to verify such significance.

Figure 4: Yang's function in 2D with a global minimum $f_* = -1$ at $(0, 0)$, where
$a = 15$.

Table 1: Comparison of algorithm performance (mean number of function evaluations ± standard deviation, with success rate in parentheses)

| Functions/Algorithms | GA | PSO | FA |
|---|---|---|---|
| Michalewicz's (d=16) | 89325 ± 7914 (95%) | 6922 ± 537 (98%) | 3752 ± 725 (99%) |
| Rosenbrock's (d=16) | 55723 ± 8901 (90%) | 32756 ± 5325 (98%) | 7792 ± 2923 (99%) |
| De Jong's (d=256) | 25412 ± 1237 (100%) | 17040 ± 1123 (100%) | 7217 ± 730 (100%) |
| Schwefel's (d=128) | 227329 ± 7572 (95%) | 14522 ± 1275 (97%) | 9902 ± 592 (100%) |
| Ackley's (d=128) | 32720 ± 3327 (90%) | 23407 ± 4325 (92%) | 5293 ± 4920 (100%) |
| Rastrigin's | 110523 ± 5199 (77%) | 79491 ± 3715 (90%) | 15573 ± 4399 (100%) |
| Easom's | 19239 ± 3307 (92%) | 17273 ± 2929 (90%) | 7925 ± 1799 (100%) |
| Griewank's | 70925 ± 7652 (90%) | 55970 ± 4223 (92%) | 12592 ± 3715 (100%) |
| Shubert's (18 minima) | 54077 ± 4997 (89%) | 23992 ± 3755 (92%) | 12577 ± 2356 (100%) |
| Yang's (d=16) | 27923 ± 3025 (83%) | 14116 ± 2949 (90%) | 7390 ± 2189 (100%) |



5 Conclusions 

In this paper, we have formulated a new firefly algorithm and analyzed its
similarities and differences with particle swarm optimization. We then implemented
and compared these algorithms. Our simulation results for finding the global
optima of various test functions suggest that particle swarm often outperforms
traditional algorithms such as genetic algorithms, while the new firefly
algorithm is superior to both PSO and GA in terms of both efficiency and success
rate. This implies that FA is potentially more powerful in solving NP-hard
problems, which will be investigated further in future studies.

The basic firefly algorithm is very efficient, but we can see that the solutions
are still changing as they approach the optima. It is possible to improve the
solution quality by reducing the randomness gradually. A further improvement
on the convergence of the algorithm is to vary the randomization parameter $\alpha$
so that it decreases gradually as the optima are approached. These could form
important topics for further research. Furthermore, as a relatively
straightforward extension, the Firefly Algorithm can be modified to solve multiobjective
optimization problems. In addition, the application of firefly algorithms in
combination with other algorithms may form an exciting area for further research.
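One simple way to let $\alpha$ decrease gradually, assumed here for illustration rather than prescribed by the paper, is a geometric schedule $\alpha_t = \alpha_0 \delta^t$ with $0 < \delta < 1$, where $\delta$ is chosen so that $\alpha$ reaches a target value after a fixed number of generations. The function name and its arguments are hypothetical.

```python
def alpha_schedule(alpha0, alpha_end, generations):
    """Hypothetical geometric schedule alpha_t = alpha0 * delta**t.

    delta is chosen so that the last generation uses alpha_end.
    """
    if generations < 2 or not 0 < alpha_end < alpha0:
        raise ValueError("need generations >= 2 and 0 < alpha_end < alpha0")
    delta = (alpha_end / alpha0) ** (1.0 / (generations - 1))
    return [alpha0 * delta ** t for t in range(generations)]


schedule = alpha_schedule(0.2, 0.002, 50)  # start at 0.2, end at 0.002
print(schedule[0], schedule[-1])
```

Inside a firefly loop, generation $t$ would then use `schedule[t]` in place of a fixed $\alpha$ in Eq. (9).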